Point cloud registration (PCR) is a popular research topic in computer vision. Recently, evolving registration methods have received continuous attention because of their robustness to the initial pose and their flexibility in objective function design. However, most evolving registration methods handle local optima poorly, and they have rarely investigated the success ratio, which reflects the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm that can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, in which the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in the cut space assists another task with a complex function landscape in escaping from local optima and raising the successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function that is robust to various overlap rates and a problem-specific metric of computational cost are introduced. Experiments against 7 evolving registration approaches and 4 traditional registration approaches on object-scale and scene-scale registration datasets show that the proposed method performs better in terms of precision and in tackling local optima.
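Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects, with an additional rotation angle parameter for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection (e.g., at an IoU threshold of 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss, based on the distance between two Gaussians, e.g., the Kullback-Leibler Divergence (KLD), aligns well with the actual detection performance metric, which is not well addressed in existing methods. Moreover, two bottlenecks, namely boundary discontinuity and the square-like problem, also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost performance. Interestingly, by analyzing the gradients of the BBox parameters under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.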
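The idea of treating a rotated box as a Gaussian and comparing two boxes with KLD can be illustrated with a minimal sketch. The function names `rbox_to_gaussian` and `gaussian_kld` are hypothetical, and the conversion assumes the covariance is R diag(w²/4, h²/4) Rᵀ, a common convention that the abstract itself does not spell out. The KLD is the standard closed form for two Gaussians; any further transformation of it into a training loss is not shown.

```python
import math


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, width, height, angle in radians) to a 2-D Gaussian.

    Assumption: Sigma = R diag(w^2/4, h^2/4) R^T, with R the rotation by theta.
    Returns (mean, (sxx, sxy, syy)) where the tuple holds the symmetric covariance.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def gaussian_kld(p, q):
    """Closed-form KL(p || q) between two 2-D Gaussians given as (mean, covariance)."""
    (px, py), (pxx, pxy, pyy) = p
    (qx, qy), (qxx, qxy, qyy) = q
    det_p = pxx * pyy - pxy * pxy
    det_q = qxx * qyy - qxy * qxy
    # trace(Sigma_q^{-1} Sigma_p), using the explicit inverse of a 2x2 matrix
    trace_term = (qyy * pxx - 2.0 * qxy * pxy + qxx * pyy) / det_q
    # Mahalanobis distance between the two centers under Sigma_q
    dx, dy = qx - px, qy - py
    maha_term = (qyy * dx * dx - 2.0 * qxy * dx * dy + qxx * dy * dy) / det_q
    return 0.5 * (trace_term + maha_term - 2.0 + math.log(det_q / det_p))
```

Under this representation, a box (w, h, θ) and the same box written as (h, w, θ + π/2) map to the same Gaussian, so their divergence is zero. This is one way to see why the boundary discontinuity mentioned above does not arise.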
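Detecting actions in videos has been widely applied in on-device applications. Practical on-device videos are always untrimmed and contain both actions and background. It is desirable for a model to both recognize the action class and localize the temporal position at which the action happens. This task is called temporal action localization (TAL), and TAL models are typically trained in the cloud, where many untrimmed videos are collected and labeled. It is desirable for a TAL model to learn continuously from new data, which can directly improve action detection precision while protecting customers' privacy. However, training a TAL model is non-trivial because it requires a large number of video samples with temporal annotations, and annotating videos frame by frame is extremely time-consuming and expensive. Although weakly supervised TAL (W-TAL) has been proposed to learn from untrimmed videos with only video-level labels, this approach is also unsuitable for on-device learning scenarios. In practical on-device learning applications, data are collected as a stream. Dividing such a long video stream into multiple video segments requires a great deal of human effort, which hinders the application of TAL to realistic on-device learning. To enable W-TAL models to learn from a long, untrimmed streaming video, we propose an efficient video learning approach that can adapt directly to new environments. We first propose a self-adaptive video dividing approach that uses contrast score-based segment merging to convert the video stream into multiple segments. We then explore different sampling strategies on TAL tasks in order to request as few labels as possible. To the best of our knowledge, this is the first attempt to learn directly from a long on-device video stream.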
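Edge computing is a popular target for accelerating machine learning algorithms that support mobile devices without incurring the communication latency of processing them in the cloud. Edge deployments of machine learning primarily consider traditional concerns such as the SWaP (size, weight, and power) constraints of their installations. However, such metrics are not sufficient to account for the environmental impact of computing, given the significant contributions of embodied energy and carbon. In this paper, we explore the tradeoffs of convolutional neural network acceleration engines for both inference and online training. In particular, we explore the use of processing-in-memory (PIM) approaches, mobile GPU accelerators, and recently released FPGAs, and compare them with a novel Racetrack memory PIM. Replacing PIM-enabled DDR3 with Racetrack memory PIM can recover its embodied energy in as little as 1 year. For high activity ratios, mobile GPUs can be more sustainable, but they have a higher embodied energy to overcome than PIM-enabled Racetrack memory.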
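Multi-view point cloud registration is fundamental to 3D reconstruction. Because point clouds captured from different viewpoints are closely connected, registration performance can be enhanced if these connections are exploited properly. This paper therefore models the registration problem as multi-task optimization and proposes a novel bi-channel knowledge sharing mechanism to solve it effectively and efficiently. The modeling of multi-view point cloud registration as multi-task optimization is twofold. First, by simultaneously considering the local accuracy between two point clouds and the global consistency imposed by all the point clouds involved, a fitness function with an adaptive threshold is derived. Second, a co-evolutionary search framework is defined to optimize multiple fitness functions belonging to related tasks concurrently. The proposed bi-channel knowledge sharing mechanism improves both solution quality and convergence speed. Intra-task knowledge sharing introduces aiding tasks that are much simpler to solve, and useful information is shared between the aiding tasks and the original task, which accelerates the search process. Inter-task knowledge sharing explores the commonalities buried among the original tasks, with the aim of preventing tasks from getting stuck in local optima. Comprehensive experiments on model-object and scene point clouds demonstrate the efficacy of the proposed method.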
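The 6th edition of the AI City Challenge focuses specifically on problems in two domains with tremendous untapped potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS) and brick-and-mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand-new track for naturalistic driving analysis, in which data were captured by several cameras mounted inside the vehicle and focused on driver safety, and the task was to classify driver actions. Track 4 was another new track, aiming to achieve automated retail store checkout using only a single-view camera. We released two leaderboards for submissions based on different methods: a public leaderboard for the contest, on which no external data may be used, and a general leaderboard for all submitted results. The top-performing participating teams established strong baselines and even outperformed the state of the art in the proposed challenge tracks.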
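Previous cycle-consistency correspondence learning methods usually leverage image patches for training. In this paper, we present a fully convolutional method, which is simpler and more coherent with the inference process. Directly applying fully convolutional training results in model collapse. We study the underlying reason for this collapse and show that the absolute positions of pixels provide a shortcut for easily achieving cycle consistency, which hinders the learning of meaningful visual representations. To break this absolute position shortcut, we propose to apply different crops to the forward and backward frames and to adopt feature warping to establish correspondence between two crops of the same frame. The former technique forces corresponding pixels on the forward and backward tracks to have different absolute positions, and the latter effectively blocks shortcuts between the forward and backward tracks. On three label propagation benchmarks, covering pose tracking, face landmark tracking, and video object segmentation, our method substantially improves on the results of the vanilla fully convolutional cycle-consistency method and achieves very competitive performance compared with self-supervised state-of-the-art approaches. Our trained model and code are available at \url{https://github.com/steve-tod/stfc3}.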
Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework that models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, one for classifying actions and one for determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, leading to both accurate recognition and accurate localization. These components are integrated into a unified network that can be trained efficiently in an end-to-end fashion. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping (TAG), is devised to generate high-quality action proposals. On two challenging benchmarks, THUMOS14 and ActivityNet, our method remarkably outperforms previous state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling actions with various temporal structures.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, which improves the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls samples from the same cluster together and pushes those from other clusters apart by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
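The pull-and-push objective described in this abstract can be sketched roughly as follows. This is an illustrative assumption rather than the loss actually used by CCGC: the function names `cosine` and `cross_view_loss`, the equal weighting of the two terms, and the use of plain averages are all hypothetical choices made for the example. It also assumes at least two clusters, so that negative pairs exist.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cross_view_loss(pos_pairs, centers_v1, centers_v2):
    """Hypothetical cross-view objective.

    pos_pairs:  (embedding in view 1, embedding in view 2) pairs drawn from the
                same high-confidence cluster.
    centers_v1, centers_v2: per-cluster centers in each view, in the same cluster order.
    """
    # Pull positives together: penalize low cross-view similarity.
    pos = sum(1.0 - cosine(a, b) for a, b in pos_pairs) / len(pos_pairs)
    # Push centers of different clusters apart: penalize high cross-view similarity.
    neg_terms = [
        cosine(c1, c2)
        for i, c1 in enumerate(centers_v1)
        for j, c2 in enumerate(centers_v2)
        if i != j
    ]
    neg = sum(neg_terms) / len(neg_terms)
    return pos + neg
```

In this sketch the loss reaches zero when every positive pair is perfectly aligned across views and the centers of different clusters are orthogonal.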